Object Detection


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

1. 2D Convolution

tf.keras.layers.Conv2D(filters, kernel_size, strides, padding, activation, kernel_regularizer, input_shape)
    filters = 32 
    kernel_size = (3,3)
    strides = (1,1)
    padding = 'SAME'
    activation = 'relu'
    kernel_regularizer = tf.keras.regularizers.l2(0.04)
    input_shape = (input_h, input_w, input_ch)


  • filters
    • the number of output channels (filters) produced by the convolution.
  • kernel_size

    • the height and width of the 2D convolution window.
  • strides

    • the step size of the kernel when traversing the image.
  • padding

    • how the border of a sample is handled.
    • A padded convolution will keep the spatial output dimensions equal to the input, whereas unpadded convolutions will crop away some of the borders if the kernel is larger than 1.
    • 'SAME' : enable zero padding
    • 'VALID' : disable zero padding
  • activation
    • Activation function to use.
  • kernel_regularizer

    • Regularizer function applied to the kernel weights matrix (e.g., L2 weight decay).
  • input and output channels

    • A convolutional layer takes a certain number of input channels ($C$) and calculates a specific number of output channels ($D$).
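For instance, the number of trainable weights in a Conv2D layer follows directly from the input channels $C$, the output channels $D$, and the kernel size. A quick plain-Python sketch, which matches the parameter counts reported in the model summary later in this notebook:

```python
def conv2d_param_count(in_ch, filters, kernel_size):
    """Weights in a Conv2D layer: one (kh x kw x C) kernel per output
    channel D, plus one bias per output channel."""
    kh, kw = kernel_size
    return kh * kw * in_ch * filters + filters

# First layer of the detection model below: 3 input channels -> 32 filters
print(conv2d_param_count(3, 32, (3, 3)))    # 896, as in the model summary
print(conv2d_param_count(32, 64, (3, 3)))   # 18496, the second Conv2D
```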

Examples


input = [None, 4, 4, 1]
filter size = [3, 3, 1, 1]
strides = [1, 1, 1, 1]
padding = 'VALID'
output = [None, 2, 2, 1]

input = [None, 5, 5, 1]
filter size = [3, 3, 1, 1]
strides = [1, 1, 1, 1]
padding = 'SAME'
output = [None, 5, 5, 1]
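The output spatial size in each case follows standard convolution arithmetic. A small helper using TensorFlow's 'SAME'/'VALID' conventions computes it:

```python
import math

def conv2d_output_shape(input_hw, kernel_size, strides, padding):
    """Spatial output size of a 2D convolution, TF convention."""
    h, w = input_hw
    kh, kw = kernel_size
    sh, sw = strides
    if padding.upper() == 'SAME':
        # zero padding keeps out = ceil(in / stride)
        return (math.ceil(h / sh), math.ceil(w / sw))
    if padding.upper() == 'VALID':
        # no padding: out = ceil((in - k + 1) / stride)
        return (math.ceil((h - kh + 1) / sh), math.ceil((w - kw + 1) / sw))
    raise ValueError(padding)

print(conv2d_output_shape((4, 4), (3, 3), (1, 1), 'VALID'))  # (2, 2)
print(conv2d_output_shape((5, 5), (3, 3), (1, 1), 'SAME'))   # (5, 5)
```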

2. Object Detection



2.1. Localization Methods

  • Histogram of Oriented Gradients (HOG) with SVM


  • Selective search
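As a rough illustration of the HOG idea, the histogram of gradient orientations for a single image cell can be sketched in NumPy. This is a simplified step only, not the full block-normalized descriptor that is fed to an SVM:

```python
import numpy as np

def orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of gradient orientations for one
    image cell -- the core step of HOG (simplified illustration)."""
    gy, gx = np.gradient(cell.astype(float))
    magnitude = np.hypot(gx, gy)
    # unsigned orientations in [0, 180) degrees, as in standard HOG
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180
    hist, _ = np.histogram(angle, bins=n_bins, range=(0, 180),
                           weights=magnitude)
    return hist

# A cell with purely vertical edges -> gradients point horizontally,
# so all the histogram mass falls in the near-0-degree bin
cell = np.tile([0, 0, 1, 1, 0, 0, 1, 1], (8, 1))
print(orientation_histogram(cell).argmax())  # 0
```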


3. Object Detection Algorithms



3.1. One-stage Object Detection

  • YOLO



  • SSD



3.2. Two-stage Object Detection

  • R-CNN

  • Faster R-CNN

  • Mask R-CNN

4. Examples

In [1]:
%%html
<center><iframe 
width="560" height="315" src="https://www.youtube.com/embed/Cgxsv1riJhI" frameborder="0" allowfullscreen>
</iframe></center>
In [2]:
%%html
<center><iframe 
width="560" height="315" src="https://www.youtube.com/embed/vRqSO6RsptU" frameborder="0">
</iframe></center>

5. Object Detection with Machinery Parts Dataset

  • A simplified two-stage object detection model for this tutorial



  • 2-D convolution layers extract features from the input image.
  • The extracted features are used for both bounding box detection and object classification.
  • The classifier and the bounding box regressor share the same features acquired from the 2-D convolution layers.

5.1. Import Library

In [1]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches

%matplotlib inline 
5.2. Load Dataset

In [2]:
train_imgs = np.load('data_files/object_detction_trn_data.npy')
train_labels = np.load('data_files/object_detction_trn_label.npy')

test_imgs = np.load('data_files/object_detction_eval_data.npy')
test_labels = np.load('data_files/object_detction_eval_label.npy')

# input image: 240 by 320
# output label: class, x, y, h, w

classes = ['Axis',
           'Bearing',
           'Bearing_Box',
           'Distance_Tube',
           'F20_20_B']
In [3]:
print(train_imgs.shape)
print(train_labels.shape)
print(test_imgs.shape)
print(test_labels.shape)
(250, 240, 320, 3)
(250, 5)
(50, 240, 320, 3)
(50, 5)
  • Images of five classes are available: axis, bearing, bearing box, distance tube, beam

  • 250 images are used for training (50 images per class)

  • 50 images are available for evaluation (10 images per class)

  • One object per image (240 by 320)

  • Labeled with a class and a normalized bounding box location: class, $x, y, h, w$

In [4]:
idx = 138

train_img = train_imgs[idx]
c, x, y, h, w = train_labels[idx]

# rescaling 
x, w = x*320, w*320
y, h = y*240, h*240

rect = patches.Rectangle((x, y), 
                         w,
                         h, 
                         linewidth = 2, 
                         edgecolor = 'r', 
                         facecolor = 'none')

fig, ax = plt.subplots(figsize = (8,8))
plt.title(classes[int(c)])
plt.imshow(train_img)
ax.add_patch(rect)
plt.axis('off')
plt.show()
In [5]:
# rescaling output labels

train_labels = np.multiply(train_labels, [1, 320, 240, 320, 240])
test_labels = np.multiply(test_labels, [1, 320, 240, 320, 240])

5.3. Define and Build an Object Detection Model




In [6]:
feature_extractor = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 32, 
                           kernel_size = (3,3), 
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (240, 320, 3)),
    
    tf.keras.layers.MaxPool2D(pool_size = (2,2)),
    
    tf.keras.layers.Conv2D(64, (3,3), activation = 'relu', padding = 'SAME'),
    
    tf.keras.layers.MaxPool2D((2,2)),
    
    tf.keras.layers.Conv2D(64, (3,3), activation = 'relu', padding = 'SAME'),
    
    tf.keras.layers.MaxPool2D((2,2)),
    
    tf.keras.layers.Conv2D(128, (3,3), activation = 'relu', padding = 'SAME'),
    
    tf.keras.layers.MaxPool2D((2,2)),
    
    tf.keras.layers.Conv2D(128, (3,3), activation = 'relu', padding = 'SAME'),
    
    tf.keras.layers.MaxPool2D((2,2)),
    
    tf.keras.layers.Conv2D(256, (3,3), activation = 'relu', padding = 'SAME'),
    
    tf.keras.layers.GlobalAveragePooling2D()
])
In [7]:
classifier = tf.keras.layers.Dense(256, activation = 'relu')(feature_extractor.output)
classifier = tf.keras.layers.Dense(256, activation = 'relu')(classifier)
classifier = tf.keras.layers.Dense(5, activation = 'softmax', name = 'cls')(classifier)
In [8]:
bb_regressor = tf.keras.layers.Dense(256, activation = 'relu')(feature_extractor.output)
bb_regressor = tf.keras.layers.Dense(256, activation = 'relu')(bb_regressor)
bb_regressor = tf.keras.layers.Dense(4, name = 'bbox')(bb_regressor)
In [9]:
object_detection = tf.keras.models.Model(inputs = feature_extractor.input, 
                                         outputs = [classifier, bb_regressor])
In [10]:
object_detection.summary()
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
conv2d_input (InputLayer)       [(None, 240, 320, 3) 0                                            
__________________________________________________________________________________________________
conv2d (Conv2D)                 (None, 240, 320, 32) 896         conv2d_input[0][0]               
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D)    (None, 120, 160, 32) 0           conv2d[0][0]                     
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 120, 160, 64) 18496       max_pooling2d[0][0]              
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 60, 80, 64)   0           conv2d_1[0][0]                   
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 60, 80, 64)   36928       max_pooling2d_1[0][0]            
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)  (None, 30, 40, 64)   0           conv2d_2[0][0]                   
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 30, 40, 128)  73856       max_pooling2d_2[0][0]            
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D)  (None, 15, 20, 128)  0           conv2d_3[0][0]                   
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 15, 20, 128)  147584      max_pooling2d_3[0][0]            
__________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D)  (None, 7, 10, 128)   0           conv2d_4[0][0]                   
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 7, 10, 256)   295168      max_pooling2d_4[0][0]            
__________________________________________________________________________________________________
global_average_pooling2d (Globa (None, 256)          0           conv2d_5[0][0]                   
__________________________________________________________________________________________________
dense (Dense)                   (None, 256)          65792       global_average_pooling2d[0][0]   
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 256)          65792       global_average_pooling2d[0][0]   
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 256)          65792       dense[0][0]                      
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 256)          65792       dense_2[0][0]                    
__________________________________________________________________________________________________
cls (Dense)                     (None, 5)            1285        dense_1[0][0]                    
__________________________________________________________________________________________________
bbox (Dense)                    (None, 4)            1028        dense_3[0][0]                    
==================================================================================================
Total params: 838,409
Trainable params: 838,409
Non-trainable params: 0
__________________________________________________________________________________________________

5.4. Define Losses and Optimization Configuration

In [11]:
object_detection.compile(optimizer = 'adam', 
                         loss = {'cls': 'sparse_categorical_crossentropy', 
                                 'bbox': 'mse'}, 
                         loss_weights = {'cls': 1, 
                                         'bbox': 1})
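Conceptually, `compile` combines the two heads' losses into one weighted sum, which is why the training log's `loss` equals `cls_loss + bbox_loss`. A NumPy sketch of that combination (an illustration of what `loss_weights` does, not the Keras internals):

```python
import numpy as np

def combined_loss(cls_probs, cls_true, bbox_pred, bbox_true,
                  w_cls=1.0, w_bbox=1.0):
    """Weighted sum of the two heads' losses (numpy illustration)."""
    # sparse categorical cross-entropy for the class head
    ce = -np.mean(np.log(cls_probs[np.arange(len(cls_true)), cls_true]))
    # mean squared error for the box head
    mse = np.mean((bbox_pred - bbox_true) ** 2)
    return w_cls * ce + w_bbox * mse

# a perfectly classified sample whose box is off by 1 pixel per coordinate
cls_probs = np.array([[1.0, 0.0, 0.0, 0.0, 0.0]])
cls_true = np.array([0])
bbox_pred = np.zeros((1, 4))
bbox_true = np.ones((1, 4))
print(combined_loss(cls_probs, cls_true, bbox_pred, bbox_true))  # 1.0
```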
In [12]:
# divide labels to cls and bbox labels

train_cls = train_labels[:,:1]
train_bbox = train_labels[:,1:]

print(train_labels.shape)
print(train_cls.shape)
print(train_bbox.shape)
(250, 5)
(250, 1)
(250, 4)
In [13]:
object_detection.fit(x = train_imgs, 
                     y = {'cls': train_cls, 'bbox': train_bbox}, 
                     epochs = 40)
Epoch 1/40
8/8 [==============================] - 9s 1s/step - loss: 3644.4502 - cls_loss: 1.8299 - bbox_loss: 3642.6204
Epoch 2/40
8/8 [==============================] - 9s 1s/step - loss: 1451.5925 - cls_loss: 2.1520 - bbox_loss: 1449.4406
Epoch 3/40
8/8 [==============================] - 9s 1s/step - loss: 688.2375 - cls_loss: 1.9426 - bbox_loss: 686.2950
Epoch 4/40
8/8 [==============================] - 9s 1s/step - loss: 591.7714 - cls_loss: 1.8022 - bbox_loss: 589.9692
Epoch 5/40
8/8 [==============================] - 9s 1s/step - loss: 477.0511 - cls_loss: 1.9062 - bbox_loss: 475.1448
Epoch 6/40
8/8 [==============================] - 9s 1s/step - loss: 379.4891 - cls_loss: 1.8168 - bbox_loss: 377.6722
Epoch 7/40
8/8 [==============================] - 9s 1s/step - loss: 362.8150 - cls_loss: 1.7573 - bbox_loss: 361.0576
Epoch 8/40
8/8 [==============================] - 9s 1s/step - loss: 346.4457 - cls_loss: 1.7708 - bbox_loss: 344.6750
Epoch 9/40
8/8 [==============================] - 9s 1s/step - loss: 340.9023 - cls_loss: 1.8372 - bbox_loss: 339.0651
Epoch 10/40
8/8 [==============================] - 9s 1s/step - loss: 342.8036 - cls_loss: 1.7379 - bbox_loss: 341.0658
Epoch 11/40
8/8 [==============================] - 9s 1s/step - loss: 330.0833 - cls_loss: 1.6797 - bbox_loss: 328.4036
Epoch 12/40
8/8 [==============================] - 9s 1s/step - loss: 327.4032 - cls_loss: 1.7265 - bbox_loss: 325.6768
Epoch 13/40
8/8 [==============================] - 9s 1s/step - loss: 331.3055 - cls_loss: 1.7154 - bbox_loss: 329.5900
Epoch 14/40
8/8 [==============================] - 9s 1s/step - loss: 330.2453 - cls_loss: 1.7126 - bbox_loss: 328.5327
Epoch 15/40
8/8 [==============================] - 9s 1s/step - loss: 326.3702 - cls_loss: 1.7452 - bbox_loss: 324.6250
Epoch 16/40
8/8 [==============================] - 9s 1s/step - loss: 330.0847 - cls_loss: 1.7089 - bbox_loss: 328.3758
Epoch 17/40
8/8 [==============================] - 9s 1s/step - loss: 320.3557 - cls_loss: 1.6623 - bbox_loss: 318.6934
Epoch 18/40
8/8 [==============================] - 9s 1s/step - loss: 318.2230 - cls_loss: 1.6441 - bbox_loss: 316.5789
Epoch 19/40
8/8 [==============================] - 9s 1s/step - loss: 328.5690 - cls_loss: 1.6722 - bbox_loss: 326.8968
Epoch 20/40
8/8 [==============================] - 9s 1s/step - loss: 304.0924 - cls_loss: 1.7069 - bbox_loss: 302.3856
Epoch 21/40
8/8 [==============================] - 9s 1s/step - loss: 283.3036 - cls_loss: 1.6912 - bbox_loss: 281.6125
Epoch 22/40
8/8 [==============================] - 9s 1s/step - loss: 279.9888 - cls_loss: 1.6705 - bbox_loss: 278.3183
Epoch 23/40
8/8 [==============================] - 9s 1s/step - loss: 262.5784 - cls_loss: 1.6733 - bbox_loss: 260.9051
Epoch 24/40
8/8 [==============================] - 9s 1s/step - loss: 265.4927 - cls_loss: 1.6547 - bbox_loss: 263.8381
Epoch 25/40
8/8 [==============================] - 9s 1s/step - loss: 263.9696 - cls_loss: 1.8092 - bbox_loss: 262.1604
Epoch 26/40
8/8 [==============================] - 9s 1s/step - loss: 280.4554 - cls_loss: 1.6892 - bbox_loss: 278.7661
Epoch 27/40
8/8 [==============================] - 9s 1s/step - loss: 288.6752 - cls_loss: 1.6881 - bbox_loss: 286.9871
Epoch 28/40
8/8 [==============================] - 9s 1s/step - loss: 265.5946 - cls_loss: 1.6290 - bbox_loss: 263.9656
Epoch 29/40
8/8 [==============================] - 9s 1s/step - loss: 228.8832 - cls_loss: 1.6046 - bbox_loss: 227.2786
Epoch 30/40
8/8 [==============================] - 9s 1s/step - loss: 215.6043 - cls_loss: 1.6653 - bbox_loss: 213.9391
Epoch 31/40
8/8 [==============================] - 9s 1s/step - loss: 215.9343 - cls_loss: 1.6319 - bbox_loss: 214.3024
Epoch 32/40
8/8 [==============================] - 9s 1s/step - loss: 235.0270 - cls_loss: 1.6134 - bbox_loss: 233.4136
Epoch 33/40
8/8 [==============================] - 9s 1s/step - loss: 225.0281 - cls_loss: 1.6012 - bbox_loss: 223.4269
Epoch 34/40
8/8 [==============================] - 9s 1s/step - loss: 214.5673 - cls_loss: 1.6060 - bbox_loss: 212.9613
Epoch 35/40
8/8 [==============================] - 8s 1s/step - loss: 213.7005 - cls_loss: 1.5971 - bbox_loss: 212.1035
Epoch 36/40
8/8 [==============================] - 9s 1s/step - loss: 211.2743 - cls_loss: 1.5600 - bbox_loss: 209.7142
Epoch 37/40
8/8 [==============================] - 9s 1s/step - loss: 217.9018 - cls_loss: 1.5162 - bbox_loss: 216.3856
Epoch 38/40
8/8 [==============================] - 9s 1s/step - loss: 228.8684 - cls_loss: 1.5275 - bbox_loss: 227.3409
Epoch 39/40
8/8 [==============================] - 9s 1s/step - loss: 227.8771 - cls_loss: 1.4684 - bbox_loss: 226.4087
Epoch 40/40
8/8 [==============================] - 9s 1s/step - loss: 190.7367 - cls_loss: 1.3554 - bbox_loss: 189.3813
Out[13]:
<tensorflow.python.keras.callbacks.History at 0x18db6a53208>

5.5. Results for Training Sets

In [18]:
idx = 110

# true label
c_label, x_label, y_label, h_label, w_label = train_labels[idx]

rect_label = patches.Rectangle((x_label, y_label),
                               w_label,
                               h_label,
                               linewidth = 2,
                               edgecolor = 'r',
                               facecolor = 'none')

# predict
c_pred, bbox = object_detection(train_imgs[[idx]])

x, y, h, w = bbox[0]
rect = patches.Rectangle((x, y),
                         w,
                         h,
                         linewidth = 2,
                         edgecolor = 'b',
                         facecolor = 'none')
In [20]:
print(classes[int(c_label)])
print(classes[np.argmax(c_pred)])
Bearing_Box
F20_20_B
In [21]:
fig, ax = plt.subplots(figsize = (8,8))
plt.imshow(train_imgs[idx])
ax.add_patch(rect_label)
ax.add_patch(rect)
plt.axis('off')
plt.show()

5.6. Results for Testing Sets

In [23]:
idx = 40

# true label
c_label, x_label, y_label, h_label, w_label = test_labels[idx]

rect_label = patches.Rectangle((x_label, y_label),
                               w_label,
                               h_label,
                               linewidth = 2,
                               edgecolor = 'r',
                               facecolor = 'none')

# predict
c_pred, bbox = object_detection(test_imgs[[idx]])

x, y, h, w = bbox[0]
rect = patches.Rectangle((x, y),
                         w,
                         h,
                         linewidth = 2,
                         edgecolor = 'b',
                         facecolor = 'none')
In [24]:
print(classes[int(c_label)])
print(classes[np.argmax(c_pred)])
F20_20_B
F20_20_B
In [25]:
fig, ax = plt.subplots(figsize = (8,8))
plt.imshow(test_imgs[idx])
ax.add_patch(rect_label)
ax.add_patch(rect)
plt.axis('off')
plt.show()
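Beyond visual inspection, the overlap between a predicted and a true box is commonly quantified with Intersection over Union (IoU). A minimal sketch, assuming the (x, y, h, w) label layout above with (x, y) the box corner, as in the Rectangle patches:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x, y, h, w)."""
    xa, ya, ha, wa = box_a
    xb, yb, hb, wb = box_b
    # overlap rectangle (zero if the boxes are disjoint)
    ix = max(0.0, min(xa + wa, xb + wb) - max(xa, xb))
    iy = max(0.0, min(ya + ha, yb + hb) - max(ya, yb))
    inter = ix * iy
    union = wa * ha + wb * hb - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0
print(iou((0, 0, 10, 10), (5, 5, 10, 10)))  # 25 / 175 ~ 0.143
```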
In [ ]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')